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Using Kalman techniques, it is possible to perform optimal estimation in linear Gaussian state- 
space models. We address here the case where the noise probability density functions are of unknown 

in ■ 

functional form. A flexible Bayesian nonparametric noise model based on Dirichlet process mixtures 
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I is introduced. Efficient Markov chain Monte Carlo and Sequential Monte Carlo methods are then 

developed to perform optimal batch and sequential estimation in such contexts. The algorithms are 
applied to blind deconvolution and change point detection. Experimental results on synthetic and 
' real data demonstrate the efficiency of this approach in various contexts. 
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Particle filter. 

I. Introduction 

Dynamic linear models are used in a variety of applications, ranging from target tracking, system 



X 

identification, abrupt change detection, etc. The models are defined as follows : 



x t = Afxt-i + C t u t + G t v t (1) 
z t = H t x t + w t (2) 

where xo ~ N{hq, So)> x t is the hidden state vector, z t is the observation, v t and w t are sequences 
of mutually independent random variables such that v t '~' F v and w t '~' F w . A t and H t are the 
known state and observation matrices, u t is a known input, Ct the input transfer matrix and G t is the 
state transfer matrix. Let us denote a^j = (a^aj+i, ...,a.j) for any sequence {a t }. The main use of 
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model (fl])-© is to estimate the hidden state x$ given the observations z\. t (filtering, with a forward 
recursion) or z± : t for t <T (smoothing, with a forward-backward recursion). 

It is a very common choice to assume that the noise probability density functions (pdfs) F v and 
F w are Gaussian, with known parameters, as this enables the use of Kalman filtering/smoothing. 
In such a framework, Kalman techniques are optimal in the sense of minimizing the mean squared 
error. There are, however, a number of cases where the Gaussian assumption is inadequate, e.g. the 
actual observation noise distribution or the transition noise are multimodal (in Section [Vl] we provide 
several such examples). In this paper, we address the problem of optimal state estimation when the 
probability density functions of the noise sequences are unknown and need to be estimated on-line 
or off-line from the data. This problem takes place in the class of identification/estimation of linear 
models with unknown statistic noises. 

A. Proposed approach 

Our mefhodolog)0 relies on the introduction of a Dirichlet Process Mixture (DPM), which is used to 
model the unknown pdfs of the state noise v t and measurement noise Wj. DPMs are flexible Bayesian 
nonparametric models which have become very popular in statistics over the last few years, to perform 
nonparametric density estimation [2-4]. Briefly, a realization of a DPM can be seen as an infinite 
mixture of pdfs with given parametric shape (e.g., Gaussian) where each pdf is denoted f{-\9). The 
parameters of the mixture (mixture weights and locations of the #'s) are given by the random mixture 
distribution G(9), which is sampled from a so-called Dirichlet Process. A prior distribution, denoted 
Gq(9) must be selected over the 6>'s (e.g., Normal-Inverse Wishart for the DPM of Gaussians case, 
where 9 contains the mean vector and the covariance matrix), while the weights follow a distribution 
characterized by a positive real-valued parameter a. For small a, only a small fraction of the weights 
is significantly nonzero, whereas for large a, many weights are away from zero. Thus, the parameter 
a tunes the prior distribution of components in the mixture, without setting a precise number of 
components. Apart from this implicit, powerful clustering property, DPMs are computationally very 
attractive due to the so-called Polya urn representation which enables straightforward computation 
of the full conditional distributions associated to the latent variables 9. 

B. Previous works 

Several algorithms have been developed to estimate noise statistics in linear dynamic systems [5-8]. 
However, these algorithms assume Gaussian noise pdfs (with unknown mean and covariance matrix). 

'Preliminary results were presented in Caron et al. [1]. 
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As will be made clearer in the following, this is a special case of our framework: if the scaling 
coefficient a tends to 0, the realizations of the DPM of Gaussian pdfs converge in distribution to a 
single Gaussian with parameter prior distribution given by the base distribution Go- Algorithms have 
also been developed to deal with non-Gaussian noises distributions, such as student-t [9], a-stable [10] 
or mixture of Gaussians [11]. These works are based on a given prior parametric shape of the pdf 
which we do not assume in this paper. 

Though many recent works have been devoted to DPMs in various contexts such as economet- 
rics [12], geoscience [13] and biology [14, 15], this powerful class of models has never been used 
in the context of linear dynamic models (to the best of our knowledge). In this paper, we show that 
DPM-based dynamic models with unknown noise distributions can be defined easily. Moreover, we 
provide several efficient computational methods to perform Bayesian inference, ranging from Gibbs 
sampling (for offline estimation) to Rao-Blackwellized particle filtering for online estimation. 

C. Paper organization 

This paper is organized as follows. In Section III we recall the basics of Bayesian nonparametric 
density estimation with DPMs. In Section HIT] we present the dynamic model with unknown noise 
distributions. In Section [TV] we derive an efficient Markov chain Monte Carlo (MCMC) algorithm to 
perform optimal estimation in the batch (offline) case. In Section [Vj we develop a Sequential Monte 
Carlo (SMC) algorithm/Particle filter to perform optimal estimation in the sequential (online) case. All 
these algorithms can be interpreted as Rao-Blackwellized methods. In Section IVIII we discuss some 
features of these algorithms, and we relate them to other existing approaches. Finally, in Section [VlJ 
we demonstrate our algorithms on two applications: blind deconvolution of impulse processes and a 
change point problem in biomedical time series. The last section is devoted to conclusions and future 
research directions. 

II. Bayesian nonparametric density estimation 

In this section, we review briefly Bayesian nonparametric density estimation^. We introduce Dirich- 
let processes as probabilistic measures on the space of probability measures, and we outline its 
discreteness. Then, the DPM model in presented. 

2 There are many ways to understand 'nonparametric'. In this paper, we follow many other papers in the same vein [2- 
4], where 'nonparametric' refers to the fact that the pdf of interest cannot be defined by a functional expansion with a 
finite-dimensional parameter space. 
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A. Density estimation 

Let yi, ...,y n be a statistically exchangeable sequence distributed with 

y fc ~ F(-) (3) 

where ~ means distributed according to. We are interested here in estimating F(-) and we consider 
the following nonparametric model 

^(y) = / f(y\0)dG(9) (4) 
Je 

where 6 G is called the latent variable or cluster variable, f(-\0) is the mixed pdf and G(-) is the 
mixing distribution. Within the Bayesian framework, it is assumed that G(-) is a Random Probability 
Measure (RPM) [4] distributed according to a prior distribution (i.e., a distribution over the set of 
probability distributions). We will select here the RPM to follow a Dirichlet Process (DP) prior. 

B. Dirichlet Processes 

Ferguson [16] introduced the Dirichlet Process (DP) as a probability measure on the space of 
probability measures. Given a probability measure Go(-) on a (measurable) space (T, .A) and a positive 
real number a, a probability distribution G(-) distributed according to a DP of base distribution Go(-) 
and scale factor a, denoted G(-) ~ DP(Gq(-), a), satisfies for any partition A\, ...,Ak of T and any 

k 

(G(Ai),... ) G(A Jfc ))~2?(aGo(Ai),... J aGo(A fc )) (5) 

where V is a standard Dirichlet distribution, classically defined for a set of random variables (bo, .., b p ) * 
V(a , ..,a p ) by 

lh=o l k a i) l=Q l=0 
where T is the gamma function, and 6 u (v) is the Dirac delta function, which is zero whenever v / u. 
From the definition in Eq. ([5]), it is easy to show that for every B € T 

E[G(B)]=Go(B) (7) 

var [G(B)\ = ° V ^-^ ° l (8) 

An important property is that the realizations of a Dirichlet process are discrete, with probability 
one. One can show that G admits the so-called stick-breaking representation, established by Sethu- 
raman [17]: 

oo 

G(0 = E^MO (9) 
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with Uj ~ Go(-)> TTj = Pj Y[i=i (1 ~~ A) an d /3j ~ £>(l,a) where B denotes the beta distribution. 
In the following, we omit (•) in G(-) and other distributions, to simplify notations. Using Eq. ©, it 
comes that the following flexible prior model is adopted for the unknown distribution F 

CO 

F{y) = Y,v ] f{y\U J ). (10) 

Apart from its flexibility, a fundamental motivation to use the DP model is the simplicity of the 
posterior update. Let 6\, . . . , 9 n be n random samples from G 

fe |G~ d -G (11) 

where G ~ DP(Go,a) then the posterior distribution of G|#i :n is also a DP 

1 n 

G|0i :w ~DP(-^Go + — — YV.a + n) (12) 
a + n a + n r~J 

Moreover, it can be shown that the predictive distribution, computed by integrating out the RPM G, 
admits the following Poly a urn representation [18] 

1 ™ 

e n +i\ei:n ~ — — + ^^g . (13) 

a + n ^ a + n 
k=i 

Therefore, conditionally on the latent variables 6\_ n sampled previously, the probability that a new 
sample is identical to an existing one is overall ^r^, whereas, with probability ^r^, the new sample 
is distributed (independently) according to Go- It should be noted that several 6^'s might have the 
same value, thus the number of "alive" clusters (denoted M), that is, the number of distinct values 
of is less than n. 

The scaling coefficient a tunes the number of "alive" clusters M. For large n, Antoniak [19] 
showed that E [M|a, n] ~ alog(l + As a tends to zero, most of the samples 6^ share the same 
value, whereas when a tends to infinity, the 9k are almost i.i.d. samples from Go- 

C. Dirichlet Process Mixtures 

Using these modeling tools, it is now possible to reformulate the density estimation problem using 
the following hierarchical model known as DPM [19]: 

G ~ DP(Gq, a, ), and, for k = 1, . . . , n 

6 k \G~G, (14) 

y k \8k ~ f(-\e k ) 

It should be noted that DPMs can model a wide variety of pdfs. In particular, assuming Gaussian 
f{-\0k), the parameter contains both the mean and the covariance, and, depending on Go, the corre- 
sponding DPM may have components with large/small variances. 
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D. Estimation objectives 

The objective of DPM-based density estimation boils down to estimating the posterior distribution 
jp(0i:n|yi:7i)> because the probability G can be integrated out analytically by using the Polya urn 
representation. Although DPMs were introduced in the 70's, these models were too complex to handle 
numerically before the introduction of Monte Carlo simulation based methods. Efficient MCMC 
algorithms [2,3,20-22] as well as Sequential Importance Sampling [23,24] enable to sample from 
p(Gi-.n\yi:n)- However, these algorithms cannot be applied to our class of models, which is presented 
below, because the noise sequences v f and w t are not observed directly. 

III. Dynamic Linear Model with Unknown Noise Distribution 

The linear dynamic model defined in Eq.'s CD)-© relies on the unknown noises {v t } and {w t } 
distributions, which are assumed to be DPMs in this paper. 

A. DPM noise models 

For both {v t } and {w(}, the pdf f(-\0) is assumed here to be a Gaussian, denoted A/"(//",E") 
and J\f(fif respectively. The base distributions Gq and Gq are assumed to be normal inverse 
Wishart distributions [25] denoted Gq = AfXW(jJ%,i%,v%,A$) and G^ = J\fTW(fJ,™, kg, vff,Ag). 
The hyperparameters ip v = {/j,q, Kq, Uq, Aq} and xjj w = {pi™ , k,q , v™, Aq } are assumed fixed but 
unknown. Finally, the scale parameters a" and a w are also assumed fixed and unknown. Overall, the 
sets of hyperparameters are denoted cjf = {a v ,ip v }, (f)' w = {a w , i)j w } and cj) = {(j) v , 4> w }. For the sake 
of presentation clarity, we assume that these hyperparameters are known, but in Subsection IIV-BI we 
address the case of unknown hyperparameters by defining priors and a specific estimation procedure. 

To summarize, we have the following models 



DP(G5,a"), G w \cp w ~ DP(G%,a w ), (15) 



and for t = 1, 2, 



i.i.d. _„ nim^in i-id- 



ev\G v ~ g u , ey\G w ~ 

... ' ' a (16) 

■v t \n ~ M(fit E?). w t \9? ~ d N{iif, Sf ). 

where B\ = {/4",£j} (resp. Of = {fif,Y,f}) is the latent cluster variable giving the mean and 

covariance matrix for that cluster, and 8 t = {6f9f}. This model is written equivalently as \ t ~ 

F v (v t ) and w t ~ F w (w t ) where F v and F w are fixed but unknown distributions written as 

F v (v t ) = J A/-(v t ;/i,£)dG>,£), (17) 
F w (w t ) = J A/"(w t ; fi, S)dG w (/i, S) (18) 
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In other words, F v and F w are countable infinite mixtures of Gaussian pdfs of unknown parameters, 
and the mixing distributions G v and G w are sampled from Dirichlet processes. 

B. Estimation of the state parameters 

In this work, our objective is to estimate G v and G w as well as the latent variables {9t} and state 
variable {x t } at each time t, conditional on the observations {z t }. In practice, only the state variable is 
of interest - G v , G w and {9 t } are nuisance parameters. Ideally, one would like to estimate online the 
sequence of posterior distributions p{x$-t \zi-.t, <j>) as * increases or the offline posterior j>(xo : t|zi : t, <ft), 
where T is the fixed length of the observation sequence zi.-t- Thanks to the Polya urn representation, 
it is possible to integrate out analytically and G w from these posteriors. The parameters 0\-t and 
Q\-T remain and the inference is based upon p(^-o-.t, Oi-.t\^i-.t, 4 1 ) or p( x 0:T> 9\-t\zi-t, 4>)- The posterior 
p(xo : t, 0\-t\z\-t, <f>) satisfies for any t 

p(x 0:t , 6>l : t|zi : t, 0) = p(x ;t\0i :t , Z 1:t , 0)p(01:t|zi :t , 0). (19) 

Conditional upon 9 t , Eq.'s (Q])-© may be rewritten as 

x t = F tXt _i + u' t (9 t ) + G t V t (9 t ) (20) 
z t = H t x t + tf+W t (9 t ) (21) 

where u' t (9t) = Cju t + GtfJ% and fif are known inputs, v' t (9 t ) and ~w' t (9 t ) are centered white 
Gaussian noise of known covariance matrices and Tif, respectively. Thus p(x.Q- t \9± : t, zi-.t, </>) (resp. 
j>(xo:t|#1:T, %v.t, 4>)) * s a Gaussian distribution whose parameters can be computed using a Kalman 
filter (resp. smoother) [26] for given 9\- t (resp.#i : T). 

One is generally interested in computing the marginal MMSE state estimate x£j£f E = E [x t |zi :t '] 
(with t' = t or f = T) 

= J XtP(^t\9l:t',Zl :t ',(j))p(9l : t'\z 1 . tl ,(j))d(xt,9 1 :t') (22) 
= y Xt\t>(9l:t>)p(9 1: t'\z,l:t',<t>)d9l :t i 

where x t \ t {9i-t) (resp. x^^i^)) is the mean of the Gaussian p(x-t\9i-.ti z i-.t, 4>) (resp. p{xt\6\-T, 'Zi-.t, 4>)) 
Both x t | t (^i : t) and x t \j<(9i:t) are computed by the Kalman filter/smoother, see Sections Hvl and Ivl 
below. 

Computing these estimates still requires integration w.r.t. the 8's, see Eq. (l22l . This kind of 
integral is not feasible in closed-form, but it can be computed numerically by using Monte Carlo 
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integration [27]. Briefly, assume that a set of N weighted samples {9i. t }i=i ) „ n with weights w\ 
are distributed according to p(#i : t|zi : t, 0), then e.g., x^ SE is computed as 



In Eq. (T23T ). the main difficulty consists of generating the weighted samples {#}^}j=i,...,7v from the 
marginal posterior jp(#i : t|zi : t, 0) (and similarly, from p(0i : t|zi : t, <p) in the offline case). 

• For offline (batch) estimation (t = T), this can be done by MCMC by building a Markov chain 
of samples {0^) T }i^i ^ with target distribution p(0i : t\zi-.t , 4>) ( m that case, = 1/N). The 
MCMC algorithms available in the literature to estimate these Bayesian nonparametric models - 
e.g. [3,21] - are devoted to density estimation in cases where the data are observed directly. They 
do not apply to our case because here, the sequences {v t } and {w t } are not observed directly. 
One only observes {z t }, assumed to be generated by the dynamic model ([I])-©. Section HV1 
proposes an MCMC algorithm dedicated to this model. 

• For online (sequential) estimation, samples can be generated by sequential importance sampling, 
as detailed in Section Pvl 



In this Section, we consider the offline state estimation. As outlined above, this requires to compute 
estimates from the posterior p(xo : T, Oi-.t\zi-.t), where we recall that t = {9%, 9f} = {//", Y% , fif, Y,f} 
is the latent variable as defined above. We first assume that the hyperparameters are fixed and known 
(Subsection IIV-AI ). then we let them be unknown, with given prior distributions (Subsection IIV-Bb . 

A. Fixed and known hyperparameters 

In this subsection, the hyperparameter vector <f> is assumed fixed and known. The marginal posterior 
p(0i:T|zi:T, 4>) can be approximated through MCMC using the Gibbs sampler [27] presented in 
Algorithm 1 below. 

Algorithm 1: Gibbs sampler to sample from p{9\:t\z>v.t , 4>) 

• Initialization : For t = 1,...,T, sample 0f' from an arbitrary initial distribution, e.g. the prior. 
. Iteration i, i = 2, . . . , N' + N: 



- For t = 1, . . . , T, sample 6$ ~ p(0 t \z 1:T , 0% 0) where 6® t = {6>f \ .., 9^%, offi, .., 9 { ^ 1] } 



To implement Algorithm 1, one needs to sample from the conditional pdf p(0 t \z\-T,O-t,4>) f° r 
each of the N' + N iterations (including N' burn-in iterations). From Bayes' rule, we have 



A 1 




(23) 



i=i 



IV. MCMC ALGORITHM FOR OFF-LINE STATE ESTIMATION 



p(9 t \z 1:T , 9- t , 4>) oc p(zi :T \B 1 . T )p(9 t \9- t , 4>). 



(24) 
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where p(6 t \9_ t , <ft) = p(&t \@-u 4> v )p{9t\Q™ti < P W )- From the Polya urn representation, these two terms 
are written as (for w, replace v with w below): 



mot*?) = a , + 1 T _ 1 E **iW) + aV ° T _ 1 ww\r), (25) 

ft — 1 j/l y— i 

Thus p(^t|zi : T, 0) can be sampled from with a Metropolis-Hastings (MH) step, where the 
candidate pdf is the conditional prior p(9t\9-t, 4>)- The acceptance probability is thus given by 

) = m 'T ^WMJ ] 

where 9^* is the candidate cluster sampled from p(9t\9-t,4>). 

The computation of the acceptance probability requires to compute the likelihood p(zi : t|#| , 0^). 
This can be done in 0(T) operations using a Kalman filter. However, this has to be done for t = 
1, . . . , T and one finally obtains an algorithm of computational complexity 0(T 2 ). Here, we propose 
to use instead the backward-forward recursion developed in [28], to obtain an algorithm of overall 
complexity 0(T). This algorithm uses the following likelihood decomposition obtained by applying 
conditional probability rules to p(zx : t-i,zt, z t+i:T\9l:T) 

p(zi:t\6ut) =p{^v.t-i\9v.t-i)p{^t\9i:u^v.t-i) / p(zt +1:T |x*, t +i:T)p(xt |zi:t, 9 1:t )dx. t (27) 

Jx 



with 

(28) 



p(z t:T |x t _i,(9 t:T ) = / p(z f+ i : T|x t _ 1 ,0 t:T )p(zt,x t |6/ f ,Xf_i)dx t 
Jx 

The first two terms of the r.h.s. in Eq. (I27T ) are computed by a forward recursion based on the 
Kalman filter [28]. The third term can be evaluated by a backward recursion according to Eq. (|28T ). 

It is shown in [28] that if f v p(zt-rlx/_i , 9t-r)dx.t-\ < oo then -7. — P( z *-^l x *-i' e *-^) j s a Gaussian 

distribution w.r.t. Xt_i, of mean (0^) and covariance ^'_i| t (0fcT)- Even if p(z 4: r|x t _i, t: r) 

is not integrable in Xt_i, the quantities P^Zi^t-.T) and i 3 t / Zi|t(^:T)'7ij_ 1 | t (^fcT) satisfy the backward 
information filter recursion (see Appendix). Based on Eq. (1271 ). the density p(Ot\zi:T>0-ti < f ) ) * s 
expressed by 

p(9 t \zi- T ,9-t) oc p(9t\9-t,^)p(zt\9i :t ,z 1 ..t-i) / p(zt+i : T|x t , 0t+i:T)l>( x *l z i:t> 9 1 ;t)dx t (29) 

Algorithm 2 summarizes the full posterior sampling procedure. It is the step-by-step description 
of Algorithm 1 that accounts for the factorization of the likelihood given by Eq. (l27l ). 

Algorithm 2: MCMC algorithm to sample from p(0i : t|zi : t> 4>) 
Initialization i = 1 

. For t = 1, ...,T, sample t (1) . 

Iteration i, i = 2, . . . , N' + N 
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• Backward recursion : For t = T,..,l, compute and store P^^ii^t+i-T) an ^ ^t\t+i^ t+i-T) m t\t+i^t+i T) 

• Forward recursion : For t = 1, .., T 

- Perform a Kalman filter step with 6 t = ef store %\t{9^ t _ 1 ,6^~ Vi ) and S t |t(^i2-n ^ 

- Metropolis-Hastings step : 

* Sample a candidate cluster 

e?> ~ P (e t \e%ct>) oo) 

* Perform a Kalman filter step with t = of ] * , store Xt\t(0il-i, $ } *) and E t|t(^2-i> ^P") 

* Compute 



= mm [ 1, ^6tt4^J (3D 



* With probability p(0 t W , 0f } *), set t (i) = 0f } *, otherwise of ] = df l) . 



State post-Sampling (for non-burn-in iterations only) 



. For i = N' + 1, ...,N' + AT, compute x 4 | T (0^) = E fx t |0^, zi :T J for all i with a Kalman 
smoother. 

It can be easily established that the simulated Markov chain is ergodic with limiting 

distribution p(9i-t\^1:t)- After N' burn-in, the N last iterations of the algorithm are kept, and the 
MMSE estimates of 6 t and Xt for all t = 0, . . . , T are computed as explained in Subsection IIII-Bi 
using 

N'+N N'+N 

ZlMMSE _ J /)(«) ~MMSE _ " ~ //)(«) \ /inx 

U t\T ~ 2^ °t X t\T ~ 2^ X *|T1^1:t) ( i2 ) 

i=N'+l i=N'+l 

B. Unknown hyperparameters 

The hyperparameters in vector (j) have some influence on the correct estimation of the DPMs F v 
and F w . In this subsection, we include them in the inference by considering them as unknowns with 
prior distributions: 

a»~0(2,£), a «~0(2,il), (33) 

r ~ Mr), v-m™) (34) 

where 77 and 1/ are known constants and po is a pdf with fixed and known parameters. The posterior 
probability p{a v \x\ : T, 0\-t, %i:T, ip v , 4> w ) reduces to p(a v \M v ,T) where M v is the number of distinct 
values taken by the clusters 0\. T . As shown in [19], this pdf can be expressed by 

s(T M v )(a v ) MV 

p(a v \M v ,T) oc l T ' A ' p(a v ) (35) 
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where the s(T, k) are the absolute values of Stirling numbers of the first kind. We can sample from 
the above pdf with a Metropolis-Hasting step using the prior Gamma pdf p{a v ) = G(%, f ) as proposal 
(and similarly for a w ). Other methods have been proposed that allow direct sampling, see for example 
West [29], and Escobar and West [21]. 

The posterior probability p(ip v \jci : t, Q\-.t, %i:T, ot v , <p w ) reduces to p{ip v \0\.' M v) where 0\.' Mv is the 
set of distinct values taken by the clusters 9\, T . It is expressed by 

M v 

p(^|x 1:T , 1:T , zi !T , a\ 4> w ) cx p (^) H G v (9 v k V) (36) 

k=l 

We can sample from this pdf with a Metropolis-Hasting step using the prior Gamma pdf po{^ v ) as 
proposal whenever direct sampling is not possible. 

V. Rao-Blackwellized Particle Filter algorithm for online state estimation 

Many applications, such as target tracking, require online state estimation. In this case, the MCMC 
approach is inadequate as it requires availability of the entire dataset to perform state estimation. In 
this section, we develop the online counterpart to the MCMC procedure presented in Section Jvj a 
sequential Monte Carlo method (also known as particle filter) is implemented, to sample on-line from 
the sequence of probability distributions {p(xo : t, #i : t|zi :t ), t = 1,2,...}. Here, the hyperparameter 
vector 4> is assumed to be known, therefore it is omitted in the following. Online hyperparameter 
estimation is discussed in Section I VII I 

As explained in Subsection IIII-BI we need to sample from p{0\-t\^i-t), because p(xo^|^irt) z i:t) 
can be computed using Kalman techniques. (The sampling procedure is indeed a generalization of the 
Rao-Blackwellized particle filter [30] to DPMs.) At time t, p(xt, 9i-.t\^i:t) is approximated through a 
set of N particles 6^ t , . . . , by the following empirical distribution 

N 

P N (Xt,Ol:t\zi:t) = Y, W t ] M(Xt;%\Ml),X t \ t (efl)) (37) 
i=l 

The parameters x t i t (0^) and are computed recursively for each particle i using the Kalman 

filter [26]. In order to build the algorithm, we note that 

p(^l|z 1: Oocp(^l^z 1:t _ 1 )p(z t |0g,z 1: ^ 1 )p(^VSli) (38) 

where 

= AA(z t ;z t|i „ 1 (^),5 t | t „ 1 (0g)) 



DRAFT 



and 



V-i(0$) = -Ht 
St\t-i(fikt) = H t 



(39) 



The Rao-Blackwellized Particle Filter (RBPF) algorithm proceeds as follows. 
Algorithm 3: Rao-Blackwellized Particle Filter to sample from p(0i : t|zi:t) 



At time . 

. For % = 1, .., N, sample (x^, E^ 



Set w 



(i) 




iV 



Po(x | , ^o|oJ 



At each time t (t > 1), do for i = 1, 
. Sample 0? ~ q(9 t \e^. l3 z w ) 

. Compute {Xtit-xC^S-i, ^°)» ^.xC^g.!, Sj^), StitC^S-!, ^°), ^(^g^j, flj^)} by using a 

Kalman filter step from {x^i^g.i), £t-i|i-i(0i2_i), ^ *t)} 
• For i = 1, . . . , N, update the weights according to 



u>; aw; 



J t-i 



. Compute S = and for i = 1, . . . , N, set wf> <- ^ 



-(i) 



(40) 



Compute iV e ff 



•h; 



(0 



• If iV e ff < rj, then resample the particles - that is, duplicate the particles with large weights are 

(i) 

remove the particles with small weights. This results in a new set of particles denoted 6\ with 
weights w t = 

• Otherwise, rename the particles and weights by removing the ^s. 

Particle filtering convergence results indicate that the variance of the Monte Carlo estimates depends 
highly on the importance distribution selected. Here, the conditionally optimal importance distribution 
is q(9t\0 ( f\_ l , zi : t) = p(9t\9i) t _i,zi:t), see [30]. However, it cannot be used, as the associated 
importance weights do not admit a closed-form expression^. In practice, the evolution pdf p{9t\9\..t-i) 
was used as the importance distribution. 



From the particles, the MMSE estimate and posterior covariance matrix of x t are given by 



N 



^t\t 



N 
i=l 



^t\t(Pl:t) + ( x t|H y i:J ~ X t\t A X t|H y i:tJ ~ X t\t ) 



(41) 

(42) 



When using the optimal importance distribution, the weights computation requires the evaluation of an integral with 
respect to 8t- It is possible to integrate analytically w.r.t. the cluster means fj, v and fj, w , but not w.r.t. the covariances. 
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VI. Applications 

In this section, we present two applications of the above model and algorithm^ We address, 
first, blind deconvolution, second, change point detection in biomedical time series. In each case, we 
assume that the statistics of the state noise are unknown, and modelled as a DPM. 

A. Blind deconvolution of impulse processes 

Various fields of Engineering and Physics, such as image de-blurring, spectroscopic data analysis, 
audio source restoration, etc. require blind deconvolution. We follow here the model presented in [31] 
for blind deconvolution of Bernoulli-Gaussian processes, which is recalled below. 

1) Statistical Model: Let H = ( 1 hi .. Kl ) = ( 1 andx t = vt vt~i ... v t ~L ) • 

The observed signal z t is the convolution of the sequence x t with a finite impulse response filter H, 
observed in additive white Gaussian noise wt- The observation model is then 

Zt = Hx t + w t (43) 

where wt ~ Af(0,a^) with <r^ is the assumed known variance of wt- The state space model can be 
written as follows: 

x t = Fxt-i + Gv t (44) 

/ OixL \ / 1 \ 

where F = , G = , mX n is the zero matrix of size m x n and I m is 

\ Olxi II J \ Olxi / 

the identity matrix of size m x m. The state transition noise vt is supposed to be independent from 

Wt, and distributed according to the mixture 

v t ~ \F V + (1 - A)<5 (45) 

where 5o is the Dirac delta function at and F v is a DPM of Gaussians defined in Eq. (fTTT ). In other 
words, the noise is alternatively zero, or distributed according to a DPM of Gaussians. 

For simplicity reasons, we introduce latent Bernoulli variables r t G {0, 1} such that Pr(r 4 = 1) = A 
and vt\(r t = 1) ~ f('\^t)> v t\( r t = 0) ~ <5o- Consider the cluster variable ip\ defined by ip\ = Q\ 
if rt = 1 and = (0, 0) (i.e. parameters corresponding to the delta-mass) if r t = 0, that is, 
(Pt ~ \F V + (1 — A) (5( o)- By integrating out F v , one has 

$\<p! t ~ \p(<p v t \<pl t ,r t = 1) + (1 - A)5 (0 ,o) (46) 

4 See Caron et al. [1] for an application on a regression problem. 
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where p((p^\if^ t ,r t = 1) is the Polya urn representation on the set (p v _ t = {93 6 tpl. t \^p 7^ <5(o,o)} of 
size T' given by 

^|(^_ t ,r t = l) J r -^ (47) 



The probability A is considered as a random variable with a beta prior density p(A) = r) 
where £ and r are known parameters. The random variable A can be marginalized out in Eq. (|46|) 

v\ v a(ip v _ t ) bCP-t) x /a o\ 

* '"-< ~ .(^ 4-^) ^ lv - ri = 11 + ^rng^ > (48) 

where 

T 

a(^_ t )=(+ £ r fc (49) 

T 

6(^) = t+ £ (l-r fc ) (50) 

where r t = if 9?^ = (0, 0) and r t = 1 otherwise. 

The hyperparameters are cj) = (a v , h) (the hyperparameters of the base distribution Gq are as- 
sumed fixed and known). These hyperparameters are assumed random with prior distribution p{4>) = 
p(a v )p(h), where 

p(a v ) = g(^), p(h)=N(0,a 2 w Z h ) (51) 

where r], v and £h are known. Conditional on xo : t, the following conditional posterior is obtained 
straightforwardly 

p(h\ X0:t ,z 1 .. T )=N(m,a 2 w Z' h ) (52) 

where 



S h 1 = S h* + Vt-l:t-Lv't-l:t-L 



t=l 

T 



m = T,' h v t _i :t _£ - u f ) 



(=1 



Samples can be generated from the Gaussian posterior p(xo:t\ t Pi.j< ' > zi : t,0^ -1 ^) with the 
simulation smoother [32]. This algorithm complexity is 0(T). 

The aim is to approximate by MCMC the joint posterior pdf p(v\ : T, <fii-.T, 4>\ z i--t)- This is done 
by implementing Algorithm 3 for the cluster variable, whereas the other variables are sampled by 
Metropolis-Hastings or direct sampling w.r.t their conditional posterior. 



DRAFT 



13 

2) Simulation results: This model has been simulated with the following parameters: T = 120, 
L = 3,h=( -1.5 0.5 -0.2 ),A = 0.4, a 2 w = 0.1,i^ = 0.7Af(2, .5)+0.3A/"(-l, .1), S h = 100, 
r] = 3, v = 3, C = 1> r = !• The hyperparameters of the base distribution are /io = 0, ko = 0.1, 
z^o = 4, Ao = 1. For the estimation, 10,000 MCMC iterations are performed, with 7,500 burn-in 
iterations. Fig. Q] (top) displays the MMSE estimate of -v 1:T together with its true value. As can be 
seen in Fig. Q] (bottom), the signal is correctly estimated and the residual is quite small. Also, as can 
be seen in Fig. |2l the estimated pdf F v is quite close to the true one. In particular, the estimated 
pdf matches the two modes of the true pdf. Multiple simulations with different starting values were 
runned, and the results appeared insensitive to initialization. This suggest that the MCMC sampler 
explores properly the posterior. 

Estimated signal 




100 200 300 400 500 

Time index 



4 - Residual between the true and estimated signals] 




100 200 300 400 500 

Time index 



Fig. 1. Top picture: True (dashed line) and MMSE estimated (solid line) signal vi : t after 10,000 MCMC iterations (7,500 
burn-in), vt is supposed to be either with probability A, or to be distributed from an unknown pdf F v with probability 
(1 — A). Bottom picture: residual e t — v t — E[vt\zi-.T] between the true and estimated signals. Although the distribution 
F v is unknown, the state v t is almost correctly estimated. 



Let &mse be the mean squared error (MSE), computed by 



e-MSE 



?]>>-v™ SE ) 2 (53) 

t = l 

To better highlight the performance of the proposed algorithm, we compared our model/algorithm 
(denoted Ml) with the following models, denoted M2 to M8: 

M2. In this model, the pdf is assumed known and set to the true value F v = 0.7A^(2, .5) + 
0.3AA(— 1, .1). The model is simply a Jump Linear Model that jumps between three modes 
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-5 5 



Fig. 2. (Top) True (dashed line) and estimated (solid line) pdf F v . The true pdf F v is a mixture of two Gaussians 
0.7A/"(2, .5) +0.37V(-1, .1). It is supposed to be unknown and jointly estimated with the state vector with 10, 000 MCMC 
iterations (7,500 burn-in) given a vector of 120 observations zu- The estimated pdf matches correctly the two modes of 
the true distribution. (Bottom) Histogram of the simulated values v, sampled from F v which a mixture of two Gaussians 
0.7AA(2, .5) +0.37V(-l,.l) 

of resp. mean/covariance (0,0), (2, .5) and (—1, .1) with resp. prior probabilities (1 — A), 
0.7A and 0.3A. 

M3. In this model, the pdf is assumed to be a Gaussian J\f (1.1, 2.3). The first two moments of 
this Gaussian are the same as those of the true pdf F v . The model is also a Jump Linear 
Model that jumps between two modes of resp. mean/covariance (0,0) and (1.1,2.3) with 
resp. prior probabilities (1 — A) and A. 

M4-7. The model described in this article but with a v fixed to 0.1 (M3), 1 (M4), 10 (M5) and 100 
(M6). 

M8. The model described in this article (Ml) but with the observation noise variance estimated 
with an inverse gamma prior a 2 w ~ iQ(u, v) with u = 2 and v = 0.1. at, ^' is sampled with 
Gibbs sampling with <7^|xo : t, zv.t, h ~iQ(v! ,v') and u' = u + \ and v' = v + \ Ylt=i( z t ~ 
H* t f. 

The algorithm used for M2 and M3 is the Gibbs sampler with backward forward recursion given 
in [28]. For the same set of observations, each MCMC algorithm has been run with 10,000 iterations 
and 7,500 burn-in iterations. MMSE estimate v^ 185 and MSE eusE are computed for each model. 
20 simulations have been performed; for each model, the mean and standard deviation of the MSE's 
over the 20 simulations are reported in Tab. I. 
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Fig. 3. Evolution of a" w in function of Gibbs sampler iteration i. The value of at" is initialized at 100. 
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I -o.2 ¥pw»f^ 



8000 10000 



m 
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Fig. 4. Evolution of the three components of the vector h' 1 ' in function of Gibbs sampler iteration i. It is initialized at 
[0 0]. The value converges toward the true value h = [-1.5 0.5 — 0.2]. 



Tab. I. Comparison of our model/algorithm with other models 



Simulation / Model 


Ml 


M2 


M3 


M4 


M5 


M6 


M7 


M8 


Mean 


0.240 


0.217 


0.290 


0.915 


0.254 


0.253 


0.314 


0.438 


Standard deviation 


0.067 


0.058 


0.085 


0.818 


0.062 


0.086 


0.222 


0.421 



Our model/algorithm (Ml) gives MSE that is only 10% more than that of the model with fixed pdf 
(M2) even though the pdf is not exactly estimated. If the observation noise variance is unknown 
and has to be estimated (M8), this has an impact on the estimation of the state vector still the sampler 
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converge more slowly to the true posterior. If the unknown pdf is set to be a Gaussian with large 
variance (M3), the MSE is 17% larger than with our approach. The estimation of a" improves the 
estimation of the state vector: MSEs are higher for models M4-7 where a v is set to a fixed value. 
This is especially true for a v = 0.1. With this small value, the sampler proposes new clusters very 
rarely and converges very slowly to the true posterior. 

B. Change-point problems in biomedical time series 

Let now consider a change-point problem in biomedical time series. The following problem has 
been discussed in [33] and [11]. Let consider patients who had recently undergone kidney transplant. 
The level of kidney function is given by the rate at which chemical substances are cleared from the 
blood, and the rate can be inferred indirectly from measurements on serum creatinine. If the kidney 
function is stable, the response series varies about a constant level. If the kidney function is improving 
(resp. decaying) at a constant level then the response series decays (resp. increases) linearly. 

1) Statistical model: The linear model, formulated by Gordon and Smith [33] is given by 

x t = F Xi _! + Gvt (54) 
z t = Hx t + w t (55) 



where xj = (m f , rhf), where m t is the level and rh t the slope, F = , G = , z t 









( 1 ) 


I'H 








v o i J 



is the measured creatinine and H = ( 1 J ■ Measurements are subject to errors due to mistakes in 
data transcription, equipment malfunction or blood contamination, wt follows the following mixture 
model 

w t ~ \ W M{Q, af ) + (1 - X w )M(0, a%) (56) 

where \ w = 0.98 is the probability that the measurements are correct, in that case the variance is 
af = 10~ 7 and a™ = 1 otherwise. To capture the effects of jumps in the creatinine level, the state 
noise v 4 is supposed to be distributed according to the following mixture model 

v t ~ X V F V + (1 - \ v )6 e . (57) 

f / nT / \ 1 
where #o = \ I ) > >, A 1 ' = 0.15 is the probability of jump in the level and F v 

r y v ° ° / 1 

is a DPM of Gaussians. Contrary to the model in [11], we do not define fixed jump levels. These 
levels, as well as their number, are estimated through the DPM. 
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Estimated creatinine level 

Measure 











20 40 60 80 100 120 140 

Time 



Fig. 5. Measured (cross) and estimated (solid line) creatinine level with 2000 MCMC iterations and 1000 burn-in iterations. 




100 120 140 



Fig. 6. Posterior probability of a jump in the creatinine level with 2000 MCMC iterations and 1000 burn-in iterations. 
For a threshold set to 0.5, the creatinine level experiences jumps at about times 8, 20 and 110. 



2) Simulation results: The last model is applied to the data provided in Gordon and Smith [33] (and 





also exploited in [1 1]). The hyperparameters of the base distribution are hq 
1 



10 6 , 







4, A 



io- 



1 



For the estimation, 2,000 MCMC iterations (with 1,000 burn-in iterations) 

are performed. Fig. [5] presents the estimated creatinine level together with the measurements. Fig. [6] 
plots the posterior probability of a jump in the creatinine level. In particular, the estimated pdf matches 
the two modes of the true pdf. Multiple simulations with different starting values were runned, and the 
results appeared insensitive to initialization. This suggest that the MCMC sampler explores properly 
the posterior. 

The estimation have also been made online with the Rao-Blackwellized algorithm with 1000 
particles. We perform fixed-lag smoothing [34] to estimate E(x t |zi :t+ T), where T is set to 10. The 
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mean time per iteration is about Is. The importance function used to sample the latent variables 6% 
is prior pdf p{9'^\6\, t _ 1 ). For a detection threshold set at 0.5, the MCMC algorithm detects 3 peaks, 
while the RBPF only detects two peaks. The trade-off between false alarm and non detection may 
be tuned with the coefficient \ v . 



x 10" 3 




20 40 60 80 100 120 140 

Time 



Fig. 7. Measured (cross) and estimated (solid line) creatinine level with a Rao-Blackwellized particle filter with 1000 
particles. 



0.9 
0.8 
0.7 




Fig. 8. Posterior probability of a jump in the creatinine level with the Rao-Blackwellized particle filter with 1000 particles. 
For a threshold set to 0.5, the creatinine level jumps are detected at about times 8 and 110. 

VII. Discussion 
In this section, we discuss several features of the approach proposed. 

A. About Dirichlet Process-based modeling 

DPMs have several main advantages. Firstly, sampling from the posterior distribution is made 
especially easy thanks to the Polya urn scheme. Second, the discreteness of the distribution G enables 
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straightforward estimation of the "number of components", without requiring reversible jump-like 
computational approaches. This discreteness has, however, some unexpected effects on inferences, 
which are reported in [35] and [36]. For example, the DP tends to favor a misbalance between the 
size of the groups of latent variables associated to the same cluster, and to concentrate the posterior 
distribution of the number of groups on a higher value. Dirichlet Processes realize nevertheless 
an attractive trade-off between versatile modeling properties and implementation advantages, which 
explain their success in various contexts - and our choice to use them in this paper. 

B. About MCMC algorithms for DPMs 

As stated in [3], the "single-site" marginal algorithm used in this paper may be stuck in a mode 
of the posterior: several noises samples v t (resp. w t ) are associated to the same cluster value UJ for 
some j in Eq. © (resp. UJ 1 ,) - in other words, there are many i's such that 0\ = UJ for some j (resp 
Of = UJ1). Since the algorithm cannot change the value of Q\ for more than one v t simultaneously, 
changes to Q\ occur rarely, as they require passage through a low-probability intermediate state in 
which noises v t in the same group are not associated to the same cluster. In alternative algorithms, 
such as those given in [3], clusters are sampled in groups, which avoids this problem at the expense 
of an increased computational cost. Nevertheless, we have demonstrated empirically in Section [Vl] 
that our MCMC scheme is indeed efficient in the applications presented. 

C. About the hyperparameter estimation in the MCMC algorithm 

As shown in the applications section, the estimation of the hyperparameter a improves the overall 
state estimation. It also makes the convergence of the Gibbs sampler faster. During the first iterations, 
the value of a is high, and the sampler proposes new clusters more easily. This enables efficient state 
space global exploration during the first iterations. When the "good" clusters have been found, the 
value of a decreases, and it eliminates useless clusters. 

D. About the convergence of the Rao-Blackwellized particle filter 

Because the DPMs F" and F' w are static (infinite-dimensional) parameters, the Rao-Blackwellized 
particle filter suffers from an accumulation of errors over time. In other words, the particle filter is not 
able to move cluster values UJ's and UJ after they are initialized. This is a well known problem of 
static parameter estimation with particle filters. However, as the static component is not the estimated 
cluster 6 t but its prior distribution G, this accumulation is less critical than with the estimation of 
true static parameters. 
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In Section |V] the hyperparameter vector (j> is assumed fixed, also because this is a static parameter. 
It could actually be estimated by implementing one of the particle filtering approaches to static 
parameter estimation. For example, the approaches in [37-40] are based on either kernel density 
methods, MCMC steps, or Maximum Likelihood. However, these algorithms also have important 
drawbacks (error accumulation with time in 0{t 2 )). An alternative solution consists of introducing 
an artificial dynamic on the hyperparameters [41] but it is not applicable to our problem: we would 
then loose the Polya urn structure given by Eq. (fT3l ). 

E. About related approaches 

Our model has some connections with Jump Linear Systems (JLS) [42,43]. In JLS, a discrete 
indicator variable switches between a (known) fixed number of different (known) linear Gaussian 
models with some (known) prior probability. Our model may be interpreted as a JLS whose number 
of different models is unknown, mean vector and covariance matrix of the linear Gaussian models are 
unknowns as well as their prior probabilities. The model proposed in this paper can also be generalized 
in the following manner. Denote t = {F u C t , H t , G t , E£, fif, Zf} = {F t , C t ,H t ,G t ,e t } and G 
a prior distribution on 9±. The following general hierarchical model 



has more flexibility than common JLS: the number of different switching models is estimated, as 
well as the parameters of these models and their prior probabilities. 

F. About observability 

In order for the observation noise w t pdf to be correctly estimated, some observability constraints 



DP(G ,a) 



x t |^,x t _i 



~ M(FtK t ^ + C t vL t + G t fJ%, Gt^G' t ) 



(58) 




HF 



(59) 




, x +n t —l 




n z are resp. the length of the state and observation vectors 
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VIII. Conclusion 

In this paper, we have presented a Bayesian nonparametric model that enables state and observation 
noise pdfs estimation, in a linear dynamic model. The Dirichlet process mixture considered here 
is flexible and we have presented two simulation-based algorithms based on Rao-Blackwellization 
which allows us to perform efficiently inference. The approach has proven efficient in applications 
- in particular, we have shown that state estimation is possible even though the dynamic and 
observation noises are of unknown pdfs. We are currently investigating the following extensions 
of our methodology. First, it would be of interest to consider nonlinear dynamic models. Second, 
it would be important to develop time-varying Dirichlet process mixture models in cases where the 
noise statistics are assumed to evolve over time. 

Appendix 

A. Notations 

H and S are sampled from a Normal inverse Wishart distribution Go of hyperparameters /xo, Kq, 
Vq, A if 

n\Y, ~ AA(/x 0) — ) 
S- 1 ~ W(u , Aq 1 ) 
where W(uq, Aq 1 ) is the standard Wishart distribution. 

B. Backward forward recursion 

The quantities P' t Zi\ t {^t:T) and Plzl\ t (^t-T) m t-i\t(^t-T) defined in Section llV-AI always satisfy the 
following backward information filter recursion. 

1) Initialization 

P^(9 T ) = HT(X™)- l H T 
P^(9 T )m' TlT (6 T ) = ifT(£»)-i(z T - /40 

2) Backward recursion. For t = T — 1..1, 

(60) 

n'|7+i(0*+^K|t+i (0t+i:t) = F? +1 (e t+1 ) x (j B . - P t / ; 1 1 |t+1 (% 1:T )s(0 t+1 )A t+1 (% 1:T ) J B T (0 t+ 

x Pt+l ]t+ i( t+l:T) (m' t+llt+1 (9 t+ i:T) ~ u' +1 (9 t+1 )) (61) 



A 



t+i 



Iriv + ^ T (^+l)- P t + l|t + l(^t+l|T)- B (^+l) 
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P' t - t \e t .. T ) = P'-^iet+v.T) + Hj^rr'Ht (62) 

P^HMKlttM = P^ l +1 (e t+ v.T)rn' At+1 {e t+l , T ) + Hj<J%)-\*t - tf) (63) 

where B(9 t ) = G t xchol(SJ') T . 
For the Metropolis Hasting ratio, we need to compute the acceptance probability only with a 
probability constant 

P(Z1:T|01:T) OCp(z t |0l :t ,Zl :t _l) / p(z t+ l :T |x t , 0t+l:T)p(x t |z lrf , 01. : t (64) 

■/AT 

If E t]t (0 1:t ) ^ then it exists n t | t (0 1: t) and Q t | t (0i :t ) such that E t | t (0i :t ) = Q t \t{0i:t)n t]t (9 l .. t )Qj lt {9 1:t ). 
The matrices Qt\t{®v.t) and n t i f (0i : t) are straightforwardly obtained using the singular value decom- 
position of £ f | t (6>i : i). Matrix Ii t \ t {9i- t ) is a n; x n t , 1 < n t < n x diagonal matrix with the nonzero 
eigenvalues of E t i t (0i : t) as elements. Then one has 

p(z 1:T |0 1:T ) OC ^(2^-1(01*), |n t | t (01:t)gJ t (01:t)P t , |;i(0 t+ l:T)Qt| t (01:t) + 4 

x exp(-|xj t (9 1:t )P^ t l 1 (9 t+1 .. T )^ tlt (9 1:t ) - 25^(01*)^ (0 m: rK |m (0 m:T ) 

-K |t+1 (0 t+ i :T ) -%(M) T x J p;^ 1 (^ +1:T )A t |,(^ 1:t ) x p;-^(% 1:T )(m; |t+1 (% 1:T ) -%(0 1:t ))) 

(65) 

where 

A t | t (01:t) = Qt\t(0l:t) [^(Ol:t) + Qjt {O t+ V. T )Qt\t (M] ^ <9j(M (66) 

The quantities x t | f (0i : t), E f | t (0i : t), z t | t _ 1 (0i : () and «St|£_i(0i : t) are, resp., the one-step ahead filtered 
estimate and covariance matrix of xt, the innovation at time t, and the covariance of this innovation. 
These quantities are provided by the Kalman filter, the system being linear Gaussian conditional upon 

01*. 
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